While these are some of my personal favorite artists (the ones that define my everyday playlists and late-night loops), this project takes a broader lens. I’ll be exploring Spotify data to analyze trends across artists, genres, and tracks and to uncover the characteristics that make a song stand out and rise to popularity. It’s less about personal taste and more about what the data reveals about music that resonates with millions.
Intro Track 🎙️
Welcome to Tracks and Stats — where the world of music meets the power of data!
In this project, we dive into Spotify’s data exports to uncover the trends and characteristics behind today’s most popular songs. Using real-world analytics, we work toward building The Ultimate Playlist — a collection inspired by the legendary spirit of Mr. Barney Stinson’s “All Rise” playlist, but customized to reflect our own vision of what ultimate music sounds like.
Through the lens of Tracks and Stats, we explore top artists, trending songs, and musical patterns that define what makes a track truly unforgettable.
Main Chorus 🎶
Drop the Beat (Data Ingest & Cleaning): Start by loading and cleaning up Spotify’s raw files — just like a DJ fine-tunes their setlist, we’re only keeping the clean stuff that’s ready to play.
Mashup Mode (Data Combination): Merge different datasets to create a complete picture. One source gives us a glimpse, but combining them? That’s where the real story comes out.
Now Playing: Trends (Descriptive Analysis): Time to break down what the data’s actually saying — who’s trending, what’s gaining traction, and what stands out.
Visual Bops (Data Visualization): Turn those stats into visuals that speak louder than numbers ever could. The goal? Charts that actually hit.
Remix the Hits (Inferential Modeling): Use models to dig deeper and figure out what really makes a song pop — and from there, build The Ultimate Playlist.
Data Soundcheck 📀
In this section, I’ve loaded and cleaned two datasets: one with song characteristics like acousticness and tempo, and another with playlist details such as track order and metadata. These datasets are now ready for analysis.
Track Traits 🎸
This section displays a dataset of songs with key traits such as acousticness, danceability, tempo, loudness, and popularity. Each row represents one song, along with its ID, release date and year, artist, and other characteristics.
library(DT)

datatable(
  head(songs, 100),
  options = list(
    pageLength = 10,
    autoWidth = TRUE,
    dom = 'lfrtip'
  ),
  rownames = FALSE,
  class = 'cell-border stripe hover'
) %>%
  formatStyle(
    columns = names(songs),
    backgroundColor = '#121212',  # Black background
    color = '#00FF00'             # Bright green text
  ) %>%
  htmlwidgets::onRender("
    function(el, x) {
      // Change text color of 'Show entries' label
      $(el).parent().find('label').css('color', '#00FF00');
      // Change text color inside 'Search' input
      $(el).parent().find('input[type=search]').css('color', '#00FF00');
      $(el).parent().find('input[type=search]').css('background-color', '#121212');
      // Also adjust the dropdown box
      $(el).parent().find('select').css('color', '#00FF00');
      $(el).parent().find('select').css('background-color', '#121212');
    }
  ")
Playlist Breakdown 🎧
This section showcases a dataset of playlists, detailing the songs included, their order, and additional metadata such as artist names and track attributes. It offers a comprehensive view of how songs are grouped into playlists; this is the step known as “rectangling” the playlist data.
# Save the raw playlists
saveRDS(playlists_raw, "data/playlists_raw.rds")

library(DT)

# Show the first 100 rows of rectified in a styled table
datatable(
  head(rectified, 100),
  options = list(
    pageLength = 10,
    autoWidth = TRUE,
    dom = 'lfrtip'
  ),
  rownames = FALSE,
  class = 'cell-border stripe hover'
) %>%
  formatStyle(
    columns = names(rectified),
    backgroundColor = '#121212',  # black background
    color = '#00FF00'             # bright green text
  ) %>%
  htmlwidgets::onRender("
    function(el, x) {
      $(el).parent().find('label').css('color', '#00FF00');
      $(el).parent().find('input[type=search]').css('color', '#00FF00');
      $(el).parent().find('input[type=search]').css('background-color', '#121212');
      $(el).parent().find('select').css('color', '#00FF00');
      $(el).parent().find('select').css('background-color', '#121212');
    }
  ")
Exploring the Tracks 🔍 🎵
How many distinct tracks and artists are represented in the playlist data?
library(dplyr)
library(DT)

# 1A: How many distinct tracks?
distinct_tracks <- n_distinct(rectified$track_id)

# 1B: How many distinct artists?
distinct_artists <- n_distinct(rectified$artist_id)

# Create a small tibble to display both answers
q1_table <- tibble::tibble(
  Metric = c("Distinct Tracks", "Distinct Artists"),
  Count = c(distinct_tracks, distinct_artists)
)

# Display the table with the dark theme
datatable(
  q1_table,
  options = list(
    pageLength = 10,
    autoWidth = TRUE,
    dom = 'lfrtip'
  ),
  rownames = FALSE,
  class = 'cell-border stripe hover'
) %>%
  formatStyle(
    columns = names(q1_table),
    backgroundColor = '#121212',
    color = '#00FF00'
  ) %>%
  htmlwidgets::onRender("
    function(el, x) {
      $(el).parent().find('label').css('color', '#00FF00');
      $(el).parent().find('input[type=search]').css('color', '#00FF00');
      $(el).parent().find('input[type=search]').css('background-color', '#121212');
      $(el).parent().find('select').css('color', '#00FF00');
      $(el).parent().find('select').css('background-color', '#121212');
    }
  ")
What are the 5 most popular tracks in the playlist data?
library(dplyr)
library(DT)

# Q2: Get top 5 most popular tracks (by number of playlist rows per track name)
top5_tracks <- rectified %>%
  count(track_name, sort = TRUE) %>%
  slice_head(n = 5)

# Display the result with the dark theme
datatable(
  top5_tracks,
  options = list(
    pageLength = 5,
    autoWidth = TRUE,
    dom = 'lfrtip'
  ),
  rownames = FALSE,
  class = 'cell-border stripe hover'
) %>%
  formatStyle(
    columns = names(top5_tracks),
    backgroundColor = '#121212',
    color = '#00FF00'
  ) %>%
  htmlwidgets::onRender("
    function(el, x) {
      $(el).parent().find('label').css('color', '#00FF00');
      $(el).parent().find('input[type=search]').css('color', '#00FF00');
      $(el).parent().find('input[type=search]').css('background-color', '#121212');
      $(el).parent().find('select').css('color', '#00FF00');
      $(el).parent().find('select').css('background-color', '#121212');
    }
  ")
What is the most popular track in the playlist data that does not have a corresponding entry in the song characteristics data?
library(dplyr)
library(DT)

# Find the most popular tracks from rectified data
popular_tracks <- rectified %>%
  count(track_name, sort = TRUE)

# Find tracks in playlists but missing from songs dataset (matched on track name)
tracks_not_in_songs <- anti_join(
  popular_tracks,
  songs,
  by = c("track_name" = "name")
)

# Get the most popular missing track
most_popular_missing_track <- tracks_not_in_songs %>%
  slice_head(n = 1)

# Display it nicely
datatable(
  most_popular_missing_track,
  options = list(
    pageLength = 1,
    autoWidth = TRUE,
    dom = 'lfrtip'
  ),
  rownames = FALSE,
  class = 'cell-border stripe hover'
) %>%
  formatStyle(
    columns = names(most_popular_missing_track),
    backgroundColor = '#121212',
    color = '#00FF00'
  ) %>%
  htmlwidgets::onRender("
    function(el, x) {
      $(el).parent().find('label').css('color', '#00FF00');
      $(el).parent().find('input[type=search]').css('color', '#00FF00');
      $(el).parent().find('input[type=search]').css('background-color', '#121212');
      $(el).parent().find('select').css('color', '#00FF00');
      $(el).parent().find('select').css('background-color', '#121212');
    }
  ")
According to the song characteristics data, what is the most “danceable” track? How often does it appear in a playlist?
library(dplyr)
library(DT)

# Get the most danceable track name from songs
most_danceable_track_name <- songs %>%
  arrange(desc(danceability)) %>%
  slice_head(n = 1) %>%
  pull(name)

# Count how many times it appears in playlists (rectified data)
most_danceable_appearance_count <- rectified %>%
  filter(track_name == most_danceable_track_name) %>%
  nrow()

# Create a tibble to display the result
q4_appearance_table <- tibble::tibble(
  Most_Danceable_Track = most_danceable_track_name,
  Number_of_Appearances = most_danceable_appearance_count
)

# Display it
datatable(
  q4_appearance_table,
  options = list(
    pageLength = 1,
    autoWidth = TRUE,
    dom = 'lfrtip'
  ),
  rownames = FALSE,
  class = 'cell-border stripe hover'
) %>%
  formatStyle(
    columns = names(q4_appearance_table),
    backgroundColor = '#121212',
    color = '#00FF00'
  ) %>%
  htmlwidgets::onRender("
    function(el, x) {
      $(el).parent().find('label').css('color', '#00FF00');
      $(el).parent().find('input[type=search]').css('color', '#00FF00');
      $(el).parent().find('input[type=search]').css('background-color', '#121212');
      $(el).parent().find('select').css('color', '#00FF00');
      $(el).parent().find('select').css('background-color', '#121212');
    }
  ")
According to the song characteristics data, the most danceable track is “Funky Cold Medina”. However, it has only one appearance in the playlists from the playlist dataset.
Which playlist has the longest average track length?
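No code accompanies this question, so here is a minimal sketch of one way it could be answered. It runs on a made-up toy table rather than the real data: the column names `playlist_name` and `duration` (in milliseconds) are assumed to match `rectified`, and `toy_playlists` with its values is purely illustrative.

```r
library(dplyr)

# Toy stand-in for `rectified` (hypothetical values; column names are assumptions)
toy_playlists <- tibble(
  playlist_name = c("Gym", "Gym", "Chill", "Chill", "Chill"),
  duration      = c(180000, 200000, 240000, 300000, 270000)  # milliseconds
)

# Average track length per playlist in minutes, keeping the longest
longest_avg_playlist <- toy_playlists %>%
  group_by(playlist_name) %>%
  summarise(
    avg_length_min = mean(duration, na.rm = TRUE) / 60000,
    n_tracks       = n(),
    .groups        = "drop"
  ) %>%
  slice_max(avg_length_min, n = 1)

print(longest_avg_playlist)
```

On the real data, it may be worth filtering out playlists with only a handful of tracks first, since a single long track can dominate a short playlist's average.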
Is the popularity column correlated with the number of playlist appearances? If so, to what degree?
library(ggplot2)
library(dplyr)

# Create data frame with track popularity and playlist appearance counts
popularity_vs_appearances <- rectified %>%
  group_by(track_id, popularity) %>%
  summarise(playlist_appearances = n(), .groups = "drop")

# Step 1: Bin playlist appearance counts
binned_data <- popularity_vs_appearances %>%
  mutate(bin = cut(
    playlist_appearances,
    breaks = c(0, 5, 10, 20, 50, 100, Inf),
    labels = c("0–5", "6–10", "11–20", "21–50", "51–100", "100+")
  ))

# Step 2: Visualize the relationship
p <- ggplot(binned_data, aes(x = bin, y = popularity)) +
  geom_boxplot(fill = "#00FF00", color = "white") +
  theme_minimal(base_size = 14) +
  theme(
    plot.background = element_rect(fill = "#121212", color = NA),
    panel.background = element_rect(fill = "#121212", color = NA),
    panel.grid.major = element_line(color = "white"),
    axis.text = element_text(color = "#00FF00"),
    axis.title = element_text(color = "#00FF00"),
    plot.title = element_text(color = "#00FF00", size = 16, face = "bold")
  ) +
  labs(
    title = "Track Popularity by Playlist Appearance Count",
    x = "Number of Playlist Appearances (Binned)",
    y = "Track Popularity"
  )

print(p)
Analysis - This boxplot shows that tracks appearing in more playlists generally have higher Spotify popularity scores. Median popularity rises with playlist appearances, especially beyond 20. Still, the wide range within each group suggests that playlist exposure helps, but is not the only factor driving popularity.
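The boxplot shows the direction of the relationship but not its strength. A minimal sketch of how the "to what degree" part could be quantified with a correlation coefficient follows. The table `toy_tracks` and its numbers are invented for illustration and only mimic the shape of `popularity_vs_appearances`; the choice of Spearman's rank correlation is my assumption, made because appearance counts tend to be heavily right-skewed.

```r
library(dplyr)

# Toy stand-in for `popularity_vs_appearances` (hypothetical values)
toy_tracks <- tibble(
  track_id             = paste0("t", 1:6),
  popularity           = c(10, 25, 40, 55, 70, 90),
  playlist_appearances = c(1, 3, 2, 20, 45, 300)
)

# Spearman's rank correlation: uses ranks, so a few huge counts do not dominate
rho <- cor(
  toy_tracks$popularity,
  toy_tracks$playlist_appearances,
  method = "spearman"
)

# Pearson correlation on log-transformed counts as a complementary linear measure
r_log <- cor(toy_tracks$popularity, log1p(toy_tracks$playlist_appearances))

print(c(spearman = rho, pearson_log = r_log))
```

Swapping `toy_tracks` for the real per-track table would give a single number to report alongside the boxplot; `cor.test()` with the same arguments adds a p-value if one is wanted.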
In what year were the most popular songs released?
library(dplyr)

# Filter and count popular songs by year
popular_songs <- rectified %>%
  filter(popularity >= 75, !is.na(year))

yearly_counts <- popular_songs %>%
  count(year) %>%
  filter(year >= 1995)

s <- ggplot(yearly_counts, aes(x = factor(year), y = n)) +
  geom_col(fill = "#00FF00", alpha = 0.8) +
  labs(
    title = "Popular Songs by Year",
    x = "Year",
    y = "Number of Popular Songs"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.background = element_rect(fill = "#121212", color = NA),
    panel.background = element_rect(fill = "#121212", color = NA),
    panel.grid.major = element_line(color = "white"),
    axis.text = element_text(color = "#00FF00"),
    axis.text.x = element_text(angle = 45, hjust = 1),  # Rotate x-axis labels
    axis.title = element_text(color = "#00FF00"),
    plot.title = element_text(color = "#00FF00", size = 16, face = "bold")
  )

print(s)
Analysis - Most popular songs were released after 2015, with a sharp rise in recent years. This aligns with how newer music dominates playlists due to recency bias and platform algorithms.
In what year did danceability peak?
Analysis - Danceability of popular songs steadily increased over time, peaking around 2019–2020. This reflects the shift toward more rhythm-driven, upbeat music — possibly influenced by streaming-era pop, TikTok trends, and global dance tracks dominating the charts.
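The computation behind this answer is not shown, so the following is only a sketch of how a yearly danceability peak could be found. It assumes the same "popular" cutoff used earlier (`popularity >= 75`) and columns named `year`, `popularity`, and `danceability`; `toy_songs` and its values are invented for illustration.

```r
library(dplyr)

# Toy stand-in for the combined track data (hypothetical values)
toy_songs <- tibble(
  year         = c(2018, 2018, 2019, 2019, 2020, 2020),
  popularity   = c(80, 60, 85, 90, 78, 76),
  danceability = c(0.60, 0.90, 0.75, 0.85, 0.70, 0.72)
)

# Mean danceability of popular songs per release year
danceability_by_year <- toy_songs %>%
  filter(popularity >= 75, !is.na(year)) %>%
  group_by(year) %>%
  summarise(
    avg_danceability = mean(danceability, na.rm = TRUE),
    n_songs          = n(),
    .groups          = "drop"
  )

# Year with the highest average
peak_year <- danceability_by_year %>%
  slice_max(avg_danceability, n = 1)

print(peak_year)
```

Keeping `n_songs` in the summary makes it easy to spot years where the average rests on only a few tracks.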
Which decade is most represented on user playlists?
Analysis - The 2010s dominate user playlists, followed by the 2000s. This reflects user preferences skewing toward more recent decades, likely due to recency, nostalgia, and streaming platform curation that favors modern music.
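A minimal sketch of how the decade counts could be derived, assuming a numeric `year` column with one row per playlist entry. `toy_playlist_rows` and its values are made up for illustration.

```r
library(dplyr)

# Toy stand-in for playlist rows (hypothetical values)
toy_playlist_rows <- tibble(
  track_name = c("a", "b", "c", "d", "e", "f"),
  year       = c(1999, 2004, 2011, 2015, 2019, 2008)
)

# Integer-divide the year by 10 to get the decade, then count rows per decade
decade_counts <- toy_playlist_rows %>%
  filter(!is.na(year)) %>%
  mutate(decade = paste0((year %/% 10) * 10, "s")) %>%
  count(decade, sort = TRUE)

print(decade_counts)
```

Because each playlist row is counted, a track that appears on many playlists contributes many times, which matches the idea of "representation on user playlists" rather than a count of distinct songs.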
Polar (circular) coordinates
library(ggplot2)
library(dplyr)

# Readable key labels (sharp/flat slash format)
key_labels <- c("C", "C#/Db", "D", "D#/Eb", "E", "F",
                "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B")

# Count key frequencies
key_counts <- rectified %>%
  count(key) %>%
  mutate(
    key_label = factor(key_labels[key + 1], levels = key_labels)
  )

# Polar plot (named key_plot to avoid masking base::c)
key_plot <- ggplot(key_counts, aes(x = key_label, y = n, fill = key_label)) +
  geom_bar(stat = "identity", color = "white", width = 1) +
  coord_polar(start = 0) +
  scale_fill_manual(values = rep("#00FF00", 12)) +
  labs(
    title = "Distribution of Musical Keys",
    x = NULL,
    y = "Number of Songs"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.background = element_rect(fill = "#121212", color = NA),
    panel.background = element_rect(fill = "#121212", color = NA),
    panel.grid.major = element_line(color = "white", linewidth = 0.3),
    panel.grid.minor = element_blank(),
    axis.text.y = element_text(color = "#00FF00"),
    axis.text.x = element_text(color = "#00FF00", size = 12, vjust = -0.8),
    axis.title.y = element_text(color = "#00FF00"),
    plot.title = element_text(color = "#00FF00", size = 16, face = "bold"),
    legend.position = "none"
  )

print(key_plot)
What are the most popular track lengths? (Are short tracks, long tracks, or something in between most commonly included in user playlists?)

library(ggplot2)
library(dplyr)

# Convert milliseconds to seconds
lengths_data <- rectified %>%
  mutate(duration_sec = duration / 1000) %>%
  filter(duration_sec >= 60, duration_sec <= 600)  # Keep songs between 1–10 min

# Histogram of track lengths (named length_plot to avoid masking base::t)
length_plot <- ggplot(lengths_data, aes(x = duration_sec)) +
  geom_histogram(fill = "#00FF00", color = "white", binwidth = 10) +
  labs(
    title = "Distribution of Track Lengths in Playlists",
    x = "Track Length (seconds)",
    y = "Number of Tracks"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.background = element_rect(fill = "#121212", color = NA),
    panel.background = element_rect(fill = "#121212", color = NA),
    panel.grid.major = element_line(color = "white", linewidth = 0.3),
    axis.text = element_text(color = "#00FF00"),
    axis.title = element_text(color = "#00FF00"),
    plot.title = element_text(color = "#00FF00", size = 16, face = "bold")
  )

print(length_plot)

Analysis - Most tracks in user playlists fall between 3 and 4 minutes (around 180–240 seconds), with a noticeable peak near 210 seconds. This suggests that mid-length songs are the most playlist-friendly: long enough to feel complete but short enough to hold attention. Very short or very long tracks appear far less frequently, likely because they’re either intros/skits or extended versions not suited for general playlists.
Two More Visualizations
Additional Exploratory questions
1️⃣ Are explicit songs more or less popular than non-explicit songs?
library(ggplot2)
library(dplyr)

# Convert 'explicit' to label
combined_data <- combined_data %>%
  mutate(explicit_label = ifelse(explicit.x == 1, "Explicit", "Non-Explicit"))

# Boxplot comparison
e <- ggplot(combined_data, aes(x = explicit_label, y = popularity.x, fill = explicit_label)) +
  geom_boxplot(color = "white", width = 0.5) +
  scale_fill_manual(values = c("Explicit" = "#FF4500", "Non-Explicit" = "#00FF00")) +
  labs(
    title = "Popularity of Explicit vs Non-Explicit Songs",
    x = "Song Type",
    y = "Popularity"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.background = element_rect(fill = "#121212", color = NA),
    panel.background = element_rect(fill = "#121212", color = NA),
    panel.grid.major = element_line(color = "white", linewidth = 0.3),
    axis.text = element_text(color = "#00FF00"),
    axis.title = element_text(color = "#00FF00"),
    plot.title = element_text(color = "#00FF00", size = 16, face = "bold"),
    legend.position = "none"
  )

print(e)
Analysis - Looks like keeping it clean might actually pay off — non-explicit songs seem to have a slight edge in popularity. While both types are doing decently well, the clean tracks have a higher median and a chunkier upper range. So yeah, maybe being radio-friendly isn’t such a bad thing after all.
2️⃣ Do popular songs tend to be more “positive” in valence?
Analysis - Turns out, popular songs do lean a little more positive, but it’s not a huge mood swing. The median valence score is just a bit higher for popular tracks, so while they’re vibing higher, it’s not all sunshine and rainbows. Basically, being a bop doesn’t always mean being bright and bubbly — but it sure doesn’t hurt.
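The comparison behind this answer is not shown; one way it could be set up is sketched below. The `popularity >= 75` cutoff is borrowed from the earlier "popular songs" filter, the Wilcoxon rank-sum test is my own suggestion for checking whether the gap in medians is more than noise, and `toy_valence` with its values is invented for illustration.

```r
library(dplyr)

# Toy stand-in for the combined track data (hypothetical values)
toy_valence <- tibble(
  popularity = c(90, 80, 76, 50, 40, 30),
  valence    = c(0.70, 0.55, 0.60, 0.50, 0.65, 0.35)
)

# Label each track as popular or not, using the same cutoff as before
grouped <- toy_valence %>%
  mutate(popularity_group = if_else(popularity >= 75, "Popular", "Less popular"))

# Median valence per group
valence_summary <- grouped %>%
  group_by(popularity_group) %>%
  summarise(median_valence = median(valence), n = n(), .groups = "drop")

# Rank-based two-sample test; makes no normality assumption about valence
valence_test <- wilcox.test(valence ~ popularity_group, data = grouped)

print(valence_summary)
print(valence_test$p.value)
```

A small difference in medians together with a large p-value would support the "not a huge mood swing" reading.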
From Anchor to Anthem: Crafting the Playlist 💿 ➡️ 🎧
# Set anchor songs
anchor_songs <- combined_data %>%
  filter(track_name %in% c("HUMBLE.", "White Iverson"))

# Get their track IDs for matching
anchor_ids <- anchor_songs$track_id

# Find songs that appear on the same playlists
playlist_overlap_songs <- combined_data %>%
  filter(track_id %in% anchor_ids) %>%
  distinct(playlist_id) %>%
  inner_join(combined_data, by = "playlist_id") %>%
  filter(!track_id %in% anchor_ids) %>%
  distinct(track_id, track_name, artist_name)

library(dplyr)

# Step 1: Extract key and tempo of anchor songs and rename for simplicity
anchor_keys_tempos <- anchor_songs %>%
  distinct(key.x, tempo.x) %>%
  rename(key = key.x, tempo = tempo.x)

# Step 2: Filter songs in combined_data with similar key and tempo ± 5
harmonic_matches <- combined_data %>%
  filter(
    key.x %in% anchor_keys_tempos$key,
    tempo.x >= min(anchor_keys_tempos$tempo) - 5,
    tempo.x <= max(anchor_keys_tempos$tempo) + 5,
    !track_id %in% anchor_ids
  ) %>%
  distinct(track_id, track_name, artist_name)
Step 4

# Filter songs by same artist
same_artist_songs <- combined_data %>%
  filter(artist_name %in% anchor_songs$artist_name) %>%
  filter(!track_id %in% anchor_ids) %>%
  distinct(track_id, track_name, artist_name)

playlist_title <- "Neon Nights: A Vibe-Driven Ride"
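The plotting code for the emotional arc reads from `final_candidates`, which is never constructed in the code shown. The sketch below is one plausible way to build such a table from the three candidate sets; it is an assumption, not the method actually used. The toy tables, the `toy_features` lookup, the popularity-based ordering, and the 12-track cap are all hypothetical.

```r
library(dplyr)

# Toy stand-ins for the three candidate sets (hypothetical rows)
toy_overlap <- tibble(
  track_id = c("a", "b"), track_name = c("A", "B"), artist_name = c("X", "Y")
)
toy_harmonic <- tibble(
  track_id = c("b", "c"), track_name = c("B", "C"), artist_name = c("Y", "Z")
)
toy_same_artist <- tibble(
  track_id = "d", track_name = "D", artist_name = "X"
)

# Toy per-track features standing in for columns of combined_data
toy_features <- tibble(
  track_id   = c("a", "b", "c", "d"),
  popularity = c(60, 85, 40, 72),
  valence    = c(0.30, 0.65, 0.80, 0.45)
)

# Stack the candidate sets, remember where each row came from,
# drop tracks found by more than one heuristic, then attach features
final_candidates <- bind_rows(
  playlist_overlap = toy_overlap,
  harmonic         = toy_harmonic,
  same_artist      = toy_same_artist,
  .id = "source"
) %>%
  distinct(track_id, .keep_all = TRUE) %>%
  left_join(toy_features, by = "track_id") %>%
  arrange(desc(popularity)) %>%
  slice_head(n = 12)  # hypothetical cap on playlist length

print(final_candidates)
```

The `source` column records which heuristic first surfaced each track, which helps when hand-picking the final order.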
Step 3
# Add track order again just in case
ultimate_playlist <- final_candidates %>%
  mutate(track_order = row_number())

# Now plot valence (emotional arc)
arc <- ggplot(ultimate_playlist, aes(x = factor(track_order), y = valence, group = 1)) +
  geom_line(color = "#00FF00", linewidth = 1.2) +  # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  geom_point(color = "white", size = 3) +
  labs(
    title = "Emotional Arc: Valence Across Playlist",
    x = "Track Order",
    y = "Valence (Positivity)"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.background = element_rect(fill = "#121212", color = NA),
    panel.background = element_rect(fill = "#121212", color = NA),
    panel.grid.major = element_line(color = "white", linewidth = 0.3),
    axis.text = element_text(color = "#00FF00"),
    axis.title = element_text(color = "#00FF00"),
    plot.title = element_text(color = "#00FF00", size = 16, face = "bold")
  )

print(arc)
Title: Echoes of Euphoria
Description: “Echoes of Euphoria” is a carefully curated playlist designed to elevate your mood with a mix of upbeat and dreamy tracks. Perfect for moments when you want to feel energized and immersed in an uplifting musical journey.
Design Principles:
Balanced valence for emotional shifts
Harmonic flow using similar keys
Features 3 underrated tracks & 2 songs I had never heard before
Anchored on HUMBLE. and White Iverson
Bonus Track 🎞️
For this project, HUMBLE. by Kendrick Lamar serves as my anchor song, capturing the core themes I’m exploring. I also selected Take On Me by a-ha and The Scientist by Coldplay from my ultimate playlist to complement and expand on the project’s mood and message.
Outro 🎶
This project, “Tracks and Stats,” has been an exciting journey where music meets analytics. From ingesting and cleaning Spotify datasets to uncovering trends in popularity, danceability, and playlist composition, we’ve explored the factors that make a song resonate with millions. Through descriptive analysis, visualizations, and inferential modeling, we’ve decoded the rhythm behind the stats.
The result? A curated playlist, “Echoes of Euphoria,” that harmonizes data-driven insights with personal creativity. This playlist exemplifies how analytics can transform raw data into a cohesive musical narrative, blending tempo, energy, and valence for an unforgettable listening experience.
As we close the loop, “Tracks and Stats” serves not only as a reflection of our love for music but also as a testament to the power of data in shaping experiences that resonate, inspire, and connect.
Keep vibing, keep analyzing, and let the music play!
Echoes of Euphoria
Spotify does not house clean versions for a lot of these songs.